I. INTRODUCTION

In the standard model of particle physics (SM) [1], the mechanism of
electroweak symmetry breaking generates a massive scalar boson called the
Higgs boson (H) [2]. Over the last few decades there has been an intensive
effort to uncover experimental evidence of the existence of the Higgs boson.
Recently, the CMS and ATLAS collaborations reported the observation of a new
boson with a mass of approximately 125 GeV/$c^2$ [3]. While the production and
decay of this particle are consistent with expectations for the SM Higgs
boson, many of its properties have yet to be established. In particular, the
relative coupling strengths of this boson to quarks, leptons, and other bosons
are important in understanding whether it is the SM Higgs boson or another
state. While the sensitivities of the CMS and ATLAS analyses were primarily
influenced by decays of this particle into Z bosons, W bosons, and photons,
the sensitivity of the low-mass Higgs boson analyses of the CDF and D0
collaborations comes largely from decays to pairs of b quarks. Recent results
from CDF and D0 show evidence of an excess of events consistent with a
125 GeV/$c^2$ SM Higgs boson decaying to b quarks [4]. However, it is not yet
known whether this excess can be attributed to the same particle observed by
the ATLAS and CMS collaborations, and further investigation is warranted.

In the SM, the dominant decay channel for a low-mass Higgs boson
($m_H < 135$ GeV/$c^2$) is to the $b\bar{b}$ final state. At the Tevatron,
pairs of b quarks are produced via the strong interaction ("QCD multijet"
background) with a cross section much larger than that predicted for Higgs
boson production followed by $H \to b\bar{b}$ decay. Searching for direct
Higgs boson production is, therefore, very difficult and far less sensitive
than searching for it in processes where the SM Higgs boson is produced in
association with a weak vector boson V (where V represents the W or Z boson).
The leptonic decay of the vector boson provides a distinct signature, enabling
significant suppression of QCD multijet events. Furthermore, selecting events
in which jets are identified as being consistent with the fragmentation of
b quarks ("b tagging") additionally improves the signal-to-background ratio in
low-mass SM Higgs boson searches.

One of the most sensitive SM Higgs boson search channels at the Tevatron is
the $VH \to \not\!\!E_T + b\bar{b}$ final state, where $\not\!\!E_T$
represents the missing transverse energy resulting from neutrinos or
unidentified charged leptons in the event. This article reports an update to
the previous CDF analysis in the $\not\!\!E_T + b\bar{b}$ search channel [5];
the same data are analyzed, but the b-tagging strategy is significantly
improved. The complete $\not\!\!E_T + b\bar{b}$ analysis method has been
described previously [5] and will only be briefly reviewed. The data
correspond to an integrated luminosity of 9.45 fb$^{-1}$, collected in
proton-antiproton collisions at a center-of-mass energy of
$\sqrt{s} = 1.96$ TeV.



II. CDF DETECTOR AND EVENT SELECTION 

The CDF II detector is described in detail elsewhere [6, 7]. It features a
cylindrical silicon detector and drift wire tracking system inside a
superconducting solenoid, surrounded by projective calorimeters and muon
detectors. Calorimeter energy deposits are clustered into jets using a cone
algorithm with an opening angle of
$\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} = 0.4$ [8]. High-$p_T$
electron candidates are identified by matching charged-particle tracks in the
inner tracking systems [T^] with energy deposits in the electromagnetic
calorimeters. Muon candidates are identified by matching tracks with
muon-detector track segments [12]. The hermeticity of the calorimeter in the
pseudorapidity range $|\eta| < 2.4$ provides reliable reconstruction of the
missing transverse energy.
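The cone-size definition above can be illustrated with a short Python sketch. It only evaluates the $\Delta R$ formula and is not the CDF clustering code; the function names are hypothetical, and wrapping the azimuthal difference into $[-\pi, \pi)$ is an assumed convention.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Signed azimuthal difference, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance sqrt(dphi^2 + deta^2) between two directions."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def in_cone(jet_eta: float, jet_phi: float,
            obj_eta: float, obj_phi: float, radius: float = 0.4) -> bool:
    """True if the object lies inside a cone of the given radius around the jet."""
    return delta_r(jet_eta, jet_phi, obj_eta, obj_phi) < radius
```

The wrapping matters for objects on either side of the $\phi = \pm\pi$ boundary, which would otherwise appear almost $2\pi$ apart.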

Events are selected during online data taking if they contain either
$\not\!\!E_T$(cal) > 45 GeV, or $\not\!\!E_T$(cal) > 35 GeV and at least two
jets. In the analysis, we further require that events contain no identified
electron or muon, and $\not\!\!E_T > 35$ GeV after corrections for
instrumental effects in jet reconstruction are applied [14]. The two jets of
greatest $E_T$ in the event are required to have transverse energies that
satisfy $25 < E_T^{j_1} < 200$ GeV and $20 < E_T^{j_2} < 120$ GeV,
respectively, according to a jet-energy determination based on calorimeter
deposits and track momentum measurements [15]. This selects candidate events
consistent with the $ZH \to \nu\bar{\nu}b\bar{b}$ process. Because $\tau$
leptons are not explicitly reconstructed and some electrons and muons escape
detection or reconstruction, events from the $WH \to \ell\nu b\bar{b}$ process
are also expected to contribute significantly. To gain sensitivity in events
with an unidentified $\tau$ lepton, we therefore also accept events where the
third-most energetic jet satisfies $15 < E_T^{j_3} < 100$ GeV. We reject
events with four reconstructed jets, where each jet exceeds the minimum
transverse energy threshold ($E_T > 15$ GeV) and has pseudorapidity
$|\eta| < 2.4$. To reduce contamination from QCD multijet events that exhibit
$\not\!\!E_T$ generated via jet mismeasurement, the angles between the
$\not\!\!E_T$ and the directions of the second and (if present) third jets are
required to be greater than 0.4 radians. To ensure that both leading-$E_T$
jets are reconstructed within the silicon detector acceptance, they are
required to satisfy $|\eta| < 2$, where at least one of them must satisfy
$|\eta| < 0.9$. The QCD multijet background is additionally reduced by 35%
using a neural-network regression algorithm that incorporates
electromagnetic- and hadronic-calorimeter quantities to account for
jet-energy mismeasurements.
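The offline requirements listed above can be collected into a single predicate. The following Python sketch is a simplified illustration under stated assumptions, not the analysis code: the `Jet` class and function names are hypothetical, the trigger and the neural-network jet-energy regression are omitted, "four reconstructed jets" is read as four or more, and the leading jets are taken from the jets that pass the counting thresholds.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    et: float   # transverse energy in GeV
    eta: float  # pseudorapidity
    phi: float  # azimuthal angle in radians


def abs_delta_phi(phi1: float, phi2: float) -> float:
    """Absolute azimuthal separation, folded into [0, pi]."""
    return abs((phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi)


def passes_preselection(met: float, met_phi: float, jets: List[Jet],
                        n_leptons: int = 0) -> bool:
    # Lepton veto and offline missing-E_T threshold.
    if n_leptons > 0 or met <= 35.0:
        return False
    # Jets counted toward the multiplicity: E_T > 15 GeV and |eta| < 2.4.
    counted = sorted((j for j in jets if j.et > 15.0 and abs(j.eta) < 2.4),
                     key=lambda j: j.et, reverse=True)
    # Assumption: events with four or more counted jets are rejected.
    if len(counted) < 2 or len(counted) >= 4:
        return False
    j1, j2 = counted[0], counted[1]
    if not (25.0 < j1.et < 200.0 and 20.0 < j2.et < 120.0):
        return False
    # Optional third jet: 15 < E_T < 100 GeV (lower bound applied above).
    if len(counted) == 3 and not counted[2].et < 100.0:
        return False
    # Missing E_T must not point along the second or third jet.
    if any(abs_delta_phi(met_phi, j.phi) <= 0.4 for j in counted[1:]):
        return False
    # Silicon acceptance: both leading jets within |eta| < 2, one within 0.9.
    if max(abs(j1.eta), abs(j2.eta)) >= 2.0:
        return False
    return min(abs(j1.eta), abs(j2.eta)) < 0.9
```

Writing the selection as one function keeps the order of the cuts explicit, which makes it easy to compare against the prose description.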



III. b-JET IDENTIFICATION ALGORITHM

This analysis employs a multivariate b-tagging algorithm (HOBIT) specifically
optimized for $H \to b\bar{b}$ searches [16]. The algorithm incorporates
quantities from various CDF b-tagging algorithms as input variables, and it
assigns an output value $v$ to each jet based on the probability that the jet
originates from the fragmentation of a b quark. Jets initiated by b quarks
tend to cluster at values close to 1, whereas those initiated by light-flavor
quarks are more likely to populate the region near $-1$. Two operating regions
are used: jets with $v > 0.98$ are considered to be tightly tagged (T),
whereas jets with $0.72 < v < 0.98$ are loosely tagged (L). Analogous to the
previous analysis, we accept events assigned to one of three categories based
on the tag quality of the two leading-$E_T$ jets: both jets are tightly tagged
(TT); one jet is tightly tagged, and the other loosely tagged (TL); and only
one jet is tightly tagged (1T). The tag categories used in both analyses and
the associated tagging efficiencies of Higgs boson signal events are given in
Table I. As can be seen, the HOBIT algorithm achieves a 32% (11%) relative
improvement in the tagging efficiency of signal events into the double-tight
(tight-loose) category. The preselection sample consists of events that
satisfy all of the above selection criteria.

TABLE I: Comparison of b-tagging efficiencies per signal event in the tag
categories of this analysis and the previous one [5]. Jets tagged by the
SECVTX b-tagging algorithm are labeled "S", and those that are tagged by the
JETPROB algorithm but not SECVTX are labeled "J". There is no overlap between
the tag categories of a given analysis by design.

                                   b-tagging efficiency per event
Tag category                       Ref. [5]         This analysis
Two tight b tags                   13.7% (SS)       18.1% (TT)
One tight and one loose b tag      13.1% (SJ)       14.6% (TL)
Only one tight b tag               31.4% (1S)       31.6% (1T)
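The mapping from per-jet tagger outputs to event categories can be written down compactly. This Python sketch is only an illustration of the thresholds quoted above: the function names are hypothetical, and assigning a jet with an output of exactly 0.98 to the loose region is an assumption, since the text does not specify the boundary. The last helper reproduces the relative efficiency gains from the Table I entries.

```python
from typing import Optional

TIGHT_CUT = 0.98  # tagger output above which a jet is tightly tagged (T)
LOOSE_CUT = 0.72  # tagger output above which a jet is loosely tagged (L)


def jet_tag(v: float) -> str:
    """Return 'T', 'L', or '0' (untagged) for a tagger output v in [-1, 1]."""
    if v > TIGHT_CUT:
        return "T"
    if v > LOOSE_CUT:
        return "L"
    return "0"


def event_category(v1: float, v2: float) -> Optional[str]:
    """Tag category of an event from the outputs of its two leading jets."""
    tags = {jet_tag(v1), jet_tag(v2)}
    if tags == {"T"}:
        return "TT"
    if tags == {"T", "L"}:
        return "TL"
    if tags == {"T", "0"}:
        return "1T"
    return None  # LL, 1L, and untagged events are not analyzed


def relative_gain_percent(new_eff: float, old_eff: float) -> float:
    """Relative improvement of new_eff over old_eff, in percent."""
    return 100.0 * (new_eff / old_eff - 1.0)
```

For example, `relative_gain_percent(18.1, 13.7)` evaluates to about 32, matching the double-tight improvement quoted in the text.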



IV. QCD MULTIJET BACKGROUND MODEL 

In the preselection sample, the dominant background to the Higgs boson signal
is still that of QCD multijet production. Other non-negligible backgrounds are
those from singly- and pair-produced top quarks ("top"), V-plus-heavy-flavor
jets, diboson production (VV), and jets from electroweak processes that are
incorrectly tagged as b jets ("electroweak mistags"). The modeling of each
background is described in Ref. [5]. A QCD multijet background model is
derived by looking at data events in a control region where
$\not\!\!E_T < 70$ GeV and the angle between the $\not\!\!E_T$ and the second
jet is less than 0.4 radians. The sample of events that satisfy these criteria
consists almost entirely of QCD multijet contributions. For tag category $i$
(where $i$ = 1T, TL, or TT), a multivariable probability density function
$f_i$ is formed by taking the ratio between tagged and pretagged events as a
function of several variables. Four of those variables are the same as in
Ref. [5]: the scalar sum of jet transverse energies $H_T$, the missing track
transverse momentum of the event $\not\!p_T$, and the charge fractions
($\sum_i p_T^i / E_T$, where the sum is over the tracks within the jet cone)
of the first- and second-most energetic jets. To improve the modeling of the
QCD multijet background, we include two more parameters in the probability
density function: the number of reconstructed vertices in the event, which is
correlated with the topological variables used in the multivariate
discriminants (see Sec. V); and
$p_\perp^{\mu} = p^{\mu_1}\sin(\mu_1, j_1) + p^{\mu_2}\sin(\mu_2, j_2)$, where
$p^{\mu_i}$ represents the momentum of the most energetic muon (if one exists)
within the cone of jet $i$, and $\sin(\mu_i, j_i)$ is the sine of the angle
between the muon and jet directions. The $p_\perp^{\mu}$ variable tends to be
large for jets in which the initiating b quark decays semileptonically through
$b \to c\mu\nu$.
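As a geometric illustration of the muon variable described above, the product of a muon momentum and the sine of its angle to the jet axis equals the magnitude of the cross product divided by the length of the jet vector. The Python sketch below uses that identity; the function names and the use of Cartesian three-vectors are assumptions made for the example.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def perp_momentum(muon_p: Vec3, jet_axis: Vec3) -> float:
    """|p_mu| * sin(angle between muon and jet) = |p_mu x jet| / |jet|."""
    mx, my, mz = muon_p
    jx, jy, jz = jet_axis
    cross = (my * jz - mz * jy, mz * jx - mx * jz, mx * jy - my * jx)
    return math.sqrt(sum(c * c for c in cross)) / math.sqrt(jx * jx + jy * jy + jz * jz)


def p_perp_mu(muon1: Optional[Vec3], jet1: Vec3,
              muon2: Optional[Vec3], jet2: Vec3) -> float:
    """Sum over the two leading jets; a jet without a muon contributes zero."""
    total = 0.0
    for muon, jet in ((muon1, jet1), (muon2, jet2)):
        if muon is not None:
            total += perp_momentum(muon, jet)
    return total
```

A muon collinear with its jet contributes nothing, while a muon emitted at a large angle, as expected in semileptonic b decays, contributes most of its momentum.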

A QCD multijet model is determined for each of the 1T, TL, and TT categories
by weighting the untagged data in the preselection sample according to the
$f_{1T}$, $f_{TL}$, and $f_{TT}$ probability density functions, respectively.
To determine the appropriate normalization for a given category, the tagged
VV, top, V-plus-heavy-flavor, and electroweak mistag background estimates are
subtracted from the tagged data, and the multijet prediction is scaled to that
difference. To validate the background modeling, we compare tagged data and
the corresponding combined background prediction in multiple control regions
[17] for various kinematic, angular, and event-shape variables, which are
included later on as inputs to multivariate discriminants that separate signal
and background processes. Shown in Fig. 1 are data-modeling comparisons of all
tagged events in the preselection sample for the invariant dijet mass
(kinematic), the angle between the $\not\!\!E_T$ and $\not\!p_T$ directions
$\Delta\phi(\not\!\!E_T, \not\!p_T)$ (angular), and jet sphericity (event
shape) [18] variables. The good agreement found in each distribution is
representative of all variables included in the neural-network discriminants
described below.

[Figure legend: Data, QCD Multijet, Top, V + HF, EWK Mistags, VV]

FIG. 1: Validation of the background model for all tagged events in the
preselection sample for (a) the invariant mass of the two leading jets, (b)
the angle between the $\not\!\!E_T$ and $\not\!p_T$, and (c) the sphericity of
the jets in the event.
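The weighting and normalization steps of the multijet model can be sketched in a few lines of Python. This is a deliberately simplified illustration: it replaces the multivariable probability density function with a one-dimensional binned tag rate, and all function names and the numbers in the usage note are hypothetical.

```python
from typing import List, Sequence


def tag_rate(tagged: Sequence[float], pretag: Sequence[float]) -> List[float]:
    """Per-bin ratio of tagged to pretagged control-region events."""
    return [t / p if p > 0 else 0.0 for t, p in zip(tagged, pretag)]


def multijet_shape(untagged: Sequence[float], rates: Sequence[float]) -> List[float]:
    """Weight untagged preselection data by the tag rate, bin by bin."""
    return [n * r for n, r in zip(untagged, rates)]


def normalize_multijet(shape: Sequence[float], tagged_data_total: float,
                       other_background_total: float) -> List[float]:
    """Scale the shape so its integral equals data minus non-multijet backgrounds."""
    target = tagged_data_total - other_background_total
    scale = target / sum(shape)
    return [scale * s for s in shape]
```

With invented inputs, `tag_rate([10, 30], [100, 200])` gives rates of 0.10 and 0.15; applied to untagged yields of 1000 and 400 this predicts 100 and 60 events, which are then rescaled to a target of 144 events if the tagged data contain 200 events and the other backgrounds account for 56.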
FIG. 2: The distribution of the NN$_{\rm QCD}$ discriminant for tagged data
events in the preselection sample in comparison with modeled background
expectations.



V. MULTIVARIATE DISCRIMINANTS 

To optimally separate Higgs boson signal from background, a staged
multivariate approach is used. A first neural network, NN$_{\rm QCD}$, is
trained to discriminate between QCD multijet and signal processes. Events that
satisfy a minimum NN$_{\rm QCD}$ threshold requirement are subjected to a
second neural network, NN$_{\rm SIG}$, designed to separate the signal from
the remaining SM backgrounds.

The NN$_{\rm QCD}$ discriminant is trained using equal event yields of QCD
multijet-modeled background and VH signal processes. As in the previous
analysis, the collection of input variables to the NN$_{\rm QCD}$ algorithm
includes kinematic, angular, and event-shape quantities 0, [l^, each of which
is validated with tagged data in the preselection sample. Figure 2 shows the
NN$_{\rm QCD}$ distribution for tagged events satisfying the preselection
criteria. By imposing a minimum NN$_{\rm QCD}$ requirement of 0.6 (which
defines the signal region), 87% of the signal is retained while 90% of the QCD
multijet background is rejected. Table II shows the expected number of signal
and background events and the observed data events in the signal region. For a
Higgs boson mass of 125 GeV/$c^2$, we expect 19 signal events in the 1T
category and roughly 11 signal events in both the TL and TT categories.

Although the current and previous analyses use the same data set, the selected
event samples are only partially correlated due to updates to the b-tagging
algorithm and the NN$_{\rm QCD}$ discriminant. Table III shows the predicted
fractions of overlapping signal events between the tag categories of the
previous analysis and those of this one. As can be seen, only 61% of the
TT-tagged signal events in this analysis were present in the SS tag category
of the previous analysis. The remaining 39% were classified as SJ events
(23%), 1S events (11%), or were not analyzed (6%) due to either not being
tagged or not surviving the minimum NN$_{\rm QCD}$ threshold requirement. A
significant portion of TT signal events is therefore different from the sample
of SS events in the previous analysis. The percentage of TT data events in
this analysis also present in the SS category of the previous one is
approximately 50%.

[Horizontal axis of each panel: NN$_{\rm SIG}$ ($m_H = 125$ GeV/$c^2$)]

FIG. 3: The distributions of tagged data events and the corresponding expected
backgrounds for the NN$_{\rm SIG}$ discriminant functions after fitting to
data for an assumed Higgs boson mass of 125 GeV/$c^2$. Panel (a) shows 1T
events, (b) shows TL events, and (c) shows TT events. The signal contribution
("VH") assumes a Higgs boson mass of 125 GeV/$c^2$ and is multiplied by a
factor of ten (left unscaled in insets) for illustrative purposes. Shown in
the inset is a semilogarithmic version of the same NN$_{\rm SIG}$ distribution
for events with NN$_{\rm SIG}$ > 0.8.

TABLE II: Comparison of the number of expected and observed events in the
signal region for different b-tagging categories. The uncertainties shown
include systematic contributions and (when appropriate) statistical
uncertainties on the simulation samples, added in quadrature for a given
process. The quoted uncertainties for the total expected background prediction
take into account the appropriate correlations among the systematic
uncertainties for each background process. Signal contributions are given for
an assumed Higgs boson mass of 125 GeV/$c^2$.

Process                                            1T             TL            TT
QCD multijet                                       5941 ± 178     637 ± 25      222 ± 16
Top                                                1174 ± 158     302 ± 40      271 ± 34
V + heavy flavor jets                              3124 ± 718     286 ± 83      211 ± 65
Electroweak mistags                                1070 ± 386     55 ± 21       13 ± 6
Diboson                                            305 ± 46       48 ± 6        41 ± 5
Total expected background                          11612 ± 949    1329 ± 112    759 ± 86
Observed data                                      11955          1443          692
$ZH \to \nu\bar{\nu}b\bar{b},\ \ell\ell b\bar{b}$  9.7 ± 1.0      5.4 ± 0.5     5.4 ± 0.5
$WH \to \ell\nu b\bar{b}$                          9.8 ± 1.0      5.3 ± 0.5     5.3 ± 0.5

TABLE III: Predicted fractions of overlapping signal events between the
previous analysis and this one. The "0T/0S" categories represent events that
do not survive the tagging or signal-region definition criteria. In each cell,
the first number is the percentage of overlapping events relative to the
previous analysis (each row sums to 100%) and the second is the percentage
relative to this analysis (each column sums to 100%). A Higgs boson mass of
125 GeV/$c^2$ is assumed.

        0T          1T            TL            TT
0S      -           - / 22%       - / 19%       - / 6%
1S      17% / -     63% / 67%     15% / 31%     6% / 11%
SJ      12% / -     20% / 9%      37% / 35%     32% / 23%
SS      5% / -      3% / 1%       15% / 15%     77% / 61%

The NN$_{\rm SIG}$ discriminant functions trained in the previous analysis [5]
are well modeled in the analogous HOBIT categories and also provide good
separation of signal and background events; they were thus retained for this
analysis. The NN$_{\rm SIG}$ discriminant accepts kinematic and angular
quantities as input variables, as well as the NN$_{\rm QCD}$ value and a
neural-network output that attempts to disentangle intrinsic from instrumental
$\not\!\!E_T$ by using tracking information [19]. The modeling of each input
variable is validated with tagged data in the signal region. Figure 3 shows
the NN$_{\rm SIG}$ distribution in the signal region (NN$_{\rm QCD}$ > 0.6)
for the 1T, TL, and TT events after the discriminants from all tag categories
were jointly fitted to data.



VI. RESULTS 

We perform a binned likelihood fit to search for the presence of a Higgs boson
signal. A combined likelihood is formed from the product of Poisson
probabilities of the event yield in each bin of the NN$_{\rm SIG}$
distribution for each tag category. Systematic uncertainties are treated as
nuisance parameters and incorporated into the limit by assuming Gaussian prior
probabilities, centered at the nominal value of the nuisance parameter, with
an RMS width equal to the absolute value of the uncertainty. The dominant
systematic uncertainties arise from the normalization of the
V-plus-heavy-flavor background contributions (30%), differences in b-tagging
efficiencies between data and simulation (8-16%) [16], uncertainty on the top
(6.5-10%) and diboson (6%) cross sections [10, HH, normalizations of the QCD
multijet background (3-7%), luminosity determination (6%) [l^], jet-energy
scale (6%) [14], trigger efficiency (1-3%), parton distribution functions
(2%), and lepton vetoes (2%). Additional uncertainties applied only to signal
include those on the Higgs boson production cross section (5%) [2^ and on
initial- and final-state radiation effects (2%). Also included are
uncertainties in the NN$_{\rm SIG}$ shape, which arise primarily from
variations in the jet-energy scale and the QCD multijet background model.
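The structure of such a likelihood can be illustrated with a minimal Python sketch: a product of per-bin Poisson terms multiplied by a Gaussian constraint on a single nuisance parameter. It is a toy under stated assumptions, not the fit used in the analysis: only one nuisance parameter is included, it rescales the whole background by `1 + rel_unc * theta`, and all names are hypothetical.

```python
import math
from typing import Sequence


def poisson_log_prob(n: int, mu: float) -> float:
    """Logarithm of the Poisson probability of observing n events given mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def log_likelihood(observed: Sequence[int], signal: Sequence[float],
                   background: Sequence[float], mu_sig: float,
                   theta: float, rel_unc: float) -> float:
    """Binned log-likelihood with one Gaussian-constrained nuisance parameter.

    mu_sig  : signal strength multiplying the signal template
    theta   : nuisance parameter in units of its standard deviation
    rel_unc : relative background normalization uncertainty (e.g. 0.30)
    """
    scale = 1.0 + rel_unc * theta
    if scale <= 0.0:
        return float("-inf")  # prior truncated so that yields stay non-negative
    total = -0.5 * theta * theta  # unit-width Gaussian prior, up to a constant
    for n, s, b in zip(observed, signal, background):
        expected = mu_sig * s + scale * b
        if expected <= 0.0:
            return float("-inf")
        total += poisson_log_prob(n, expected)
    return total
```

Working with logarithms turns the product over bins and tag categories into a sum, which avoids numerical underflow when many bins are combined.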

A Bayesian likelihood method is used to set 95% credibility level (C.L.) upper
limits on the SM Higgs boson production cross section times branching fraction
$\sigma(VH) \times \mathcal{B}(H \to b\bar{b})$. For the signal hypothesis, a
flat, non-negative prior probability is assumed for the number of selected
Higgs boson events. The Gaussian priors of the nuisance parameters are
truncated at zero to ensure non-negative event yield predictions in each
NN$_{\rm SIG}$ bin. The 95% C.L. limits for the observed data and the
median-expected outcomes assuming only SM backgrounds are shown in Fig. 4 and
Table IV. An average improvement of 14% is obtained in expected upper limits
relative to the previous analysis [5]. The observed limits lie below the
expected values at the level of roughly one standard deviation for
$m_H > 120$ GeV/$c^2$, and at the level of approximately two standard
deviations for lower Higgs boson masses. In contrast, the observed limits of
the previous analysis exceed the median-expected limits by roughly one
standard deviation for $m_H > 120$ GeV/$c^2$ and are in approximate agreement
with expected limits for lower masses. These differences correspond to a
decrease of roughly 55% in the observed limits relative to those of the
previous analysis, independent of $m_H$.

FIG. 4: Observed and expected (median, for the background-only hypothesis)
95% C.L. upper limits on the VH cross section times
$\mathcal{B}(H \to b\bar{b})$ divided by the SM prediction, as a function of
the Higgs boson mass. The bands indicate the 68% and 95% credibility regions
where the limits can fluctuate, in the absence of signal.
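The idea of a Bayesian upper limit with a flat, non-negative signal prior can be shown for the simplest possible case, a single-bin counting experiment with a known background. The Python sketch below integrates the posterior numerically; it ignores nuisance parameters and discriminant shapes, so it is a toy illustration rather than the procedure used for Table IV, and the function name and integration settings are assumptions.

```python
import math


def bayesian_upper_limit(n_obs: int, background: float,
                         cl: float = 0.95, steps: int = 20000) -> float:
    """Upper limit on the signal yield s for a flat prior on s >= 0.

    The posterior is proportional to the Poisson likelihood of n_obs events
    with mean s + background; it is integrated with the trapezoidal rule.
    """
    if background <= 0.0:
        raise ValueError("background must be positive")
    # Integration range chosen generously beyond the bulk of the posterior.
    s_max = n_obs + 10.0 * (math.sqrt(n_obs + 1.0) + 1.0)
    ds = s_max / steps
    log_post = [n_obs * math.log(i * ds + background) - (i * ds + background)
                for i in range(steps + 1)]
    peak = max(log_post)
    dens = [math.exp(lp - peak) for lp in log_post]  # rescaled for stability
    total = sum(0.5 * (dens[i] + dens[i + 1]) * ds for i in range(steps))
    running = 0.0
    for i in range(steps):
        running += 0.5 * (dens[i] + dens[i + 1]) * ds
        if running >= cl * total:
            return (i + 1) * ds
    return s_max
```

For zero observed events the posterior is a pure exponential, so the 95% limit approaches $-\ln(0.05) \approx 3.0$ events regardless of the background, which provides a simple cross-check of the integration.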

VII. DISCUSSION OF RESULTS 

We have investigated potential causes for the sizable shift in the observed
limits. To quantify the impact of changes to the analysis design and treatment
of systematic uncertainties, we reanalyze the data sample using the 1S, SJ,
and SS categories used in the previous analysis (Sec. VII A). We also study
the effects from other sources that can influence the observed limits
(Sec. VII B). A summary of the discussion is given in Sec. VII C.

A. Reanalysis using 1S, SJ, and SS tagging categories

Besides the change in b-tagging method, there are other less significant
changes made in this analysis with respect to the previous one:

1. The b-tag scale factors and their associated uncertainties are now handled
with an improved treatment of the correlations between tag categories.



FIG. 5: Observed and expected (median, for the background-only hypothesis)
95% C.L. upper limits on Higgs production in the previous analysis and those
of the S-J reanalysis described in Sec. VII A. The darker (black) set of lines
represents the observed and expected limits from the previous analysis,
whereas the lighter set (red) represents those of the S-J reanalysis. The 68%
and 95% credibility regions are those of Ref. [5].

2. Instead of treating the normalization uncertainties of all
V-plus-heavy-flavor samples as fully correlated, the V-plus-heavy-flavor
samples are grouped according to flavor content of the final state, with each
group receiving a 30% uncertainty. The uncertainties associated with each
V-plus-heavy-flavor group are treated as uncorrelated with one another.

3. An additional $\not\!\!E_T > 35$ GeV requirement is made that corresponds
to the trigger-level reconstructed $\not\!\!E_T$ value. This has the effect of
further reducing the QCD multijet background at the few percent level.

4. As mentioned in Sec. II, upper limits are imposed on jet transverse
energies. This is done to avoid a kinematic region susceptible to significant
false-positive tagging rates for the HOBIT algorithm.

5. An additional Z-plus-jets sample is included where the Z boson decays to a
$b\bar{b}$ pair. The change in overall expected yields due to this additional
sample is very small as the $\not\!\!E_T$ here is instrumental.

To estimate the effect of these changes on the limits, we reanalyze the same
data sample using the 1S, SJ, and SS tagging categories of the previous
analysis. For this test, hereafter referred to as the S-J reanalysis, we
retain the NN$_{\rm QCD}$ discriminant of the previous analysis so that the
signal region definitions of this test and that of the previous analysis are
the same. The results are shown in Fig. 5. As can be seen, the expected limits
of Ref. [5] and the S-J reanalysis are in very good agreement. The observed
limits of the S-J reanalysis are systematically lower than the observed limits
of Ref. [5], with an average difference of -5% for $m_H < 120$ GeV/$c^2$ and
-17% for $m_H > 120$ GeV/$c^2$. For comparison, we note that the observed
limit for the analysis described in this note is 47% lower than that of the
S-J reanalysis at $m_H = 125$ GeV/$c^2$. The analysis changes described here
thus account for a non-negligible percentage of the sizable shift in the
observed limits.

TABLE IV: Expected and observed 95% C.L. upper limits on the VH cross section
times $\mathcal{B}(H \to b\bar{b})$ divided by the SM prediction [2^ .

$m_H$ (GeV/$c^2$)    Expected    Observed
 90                   1.57        0.72
 95                   1.83        0.94
100                   1.96        0.94
105                   2.08        0.91
110                   2.16        1.32
115                   2.48        1.53
120                   2.80        1.94
125                   3.33        3.06
130                   4.13        2.95
135                   5.26        3.49
140                   6.93        5.35
145                   9.91        6.69
150                  15.55       11.82

We have also investigated the impact of these changes on previously published
combined CDF $H \to b\bar{b}$ limits [25]. The NN$_{\rm SIG}$ discriminants of
the S-J reanalysis, and the updated treatment of systematic uncertainties, are
combined with the discriminants of the CDF $\ell\nu b\bar{b}$ and
$\ell\ell b\bar{b}$ analyses [H, [13 to obtain an updated CDF
$H \to b\bar{b}$ result. Using the discriminants of the S-J reanalysis, the
local significance of the CDF-combined excess at a Higgs boson mass of
125 GeV/$c^2$ is recalculated. Within the statistical precision of the
calculation, the local significance is unchanged at 2.7 standard deviations
with respect to the background-only hypothesis.



B. Additional cross-checks 

1. Systematic effects from b-tagging 

Since switching to a new b-tagging algorithm is the most significant change
adopted for this analysis, it is important to ensure that the performance of
the HOBIT algorithm is well understood and well modeled. As with other
b-tagging algorithms, systematic effects associated with using HOBIT are taken
into account by correcting the simulation for differences in b-tagging
behavior between data and simulation. Two methods are used to calibrate the
simulation, both of which have been used extensively at CDF: one where the
$t\bar{t}$ cross section is fixed to its theoretical prediction, and scale
factors are derived that correct the simulation to the b-tag and mistag
efficiencies measured in data; and another where heavy- and light-flavor jets
are identified with and without electron conversions within them, allowing for
a determination of the same scale factors [16]. As both methods give
consistent results for the HOBIT scale factors at both T and L operating
points, they are averaged together, resulting in b-tag efficiency scale
factors of 0.915 ± 0.035 (T) and 0.993 ± 0.035 (L) and mistag efficiency scale
factors of 1.50 ± 0.031 (T) and 1.33 ± 0.015 (L), where the dominant
contributions to the uncertainties are from the theoretical uncertainty on the
$t\bar{t}$ cross section [2^ . The variation of these scale factors with
respect to several variables (e.g., jet energies and instantaneous luminosity)
has been investigated, and any sizable deviations relative to the central
predictions are included in the systematic uncertainties. These scale factors
and their associated uncertainties have been propagated through this analysis
in a manner consistent with the treatment of b-tag and mistag scale factors in
the other $H \to b\bar{b}$ CDF analyses Hi, [231.

To verify that the choice of b-tagging algorithm does not result in
mismodeling within the high-score regions of the NN$_{\rm SIG}$ distributions,
we validate the background model with the data in an electroweak control
sample. For this control sample we require, in addition to the preselection
sample criteria, the presence of at least one identified, isolated electron or
muon with a minimum transverse momentum of 20 GeV/$c$ in the event. The
electroweak sample is dominated by backgrounds that are modeled by simulation
and not the QCD multijet background, whose model is derived from data.
Figure 6 shows the NN$_{\rm SIG}$ distributions for TT and reanalyzed SS
events in the electroweak control region. As can be seen, there is no obvious
difference in the simulation modeling of the NN$_{\rm SIG}$ discriminants for
the HOBIT or SECVTX algorithms. Comparisons in the 1T-1S and TL-SJ categories
give similar conclusions.



2. Effects of statistical fluctuations 

The expected limits are most significantly impacted by the bins of the
discriminants with the highest signal-to-background ratios. For the
NN$_{\rm SIG}$ distributions, these are the bins with the highest
NN$_{\rm SIG}$ values, as can be seen in Fig. 3. Because these bins tend to
contain only small numbers of data events, the observed limits are susceptible
to statistical fluctuations. Although we do not know if the data events are
from signal or background processes, we explore how a fluctuation of yields
from either type of process would manifest itself in the NN$_{\rm SIG}$
distributions. As part of the shift in observed limits is due to the analysis
changes mentioned in Sec. VII A, the yields quoted below for the SS and SJ
results reflect those of the S-J reanalysis and not those of Ref. [5].

As shown in Table III, we expect significant signal event migrations between
the tag categories of the previous analysis and those of this one.
Consequently, if a Higgs boson signal is present, we may observe some very
high NN$_{\rm SIG}$ score events in one version of the analysis that either
migrate to another tag category or do not appear within the other analysis.
Since the impact of these high-score events on the observed limits can be
significant, the migration of a few signal-like events between tag categories
in the S-J reanalysis and the current analysis can lead to non-negligible
changes in observed limits relative to expectations. Focusing on discriminant
outputs for the 125 GeV/$c^2$ Higgs boson mass hypothesis, we compare data
events in the very highest-score NN$_{\rm SIG}$ bins of both analyses and find
one potential example for this type of event migration. In particular, we
observe three events with NN$_{\rm SIG}$ values above 0.9 in the SJ category
that are not present in any tag category of the current analysis (the new
tagging algorithm categorizes two of these events as LL and the other as 1L).
If these three data events were to be simply added back into the TL category
of the new analysis, the decrease in the observed limits at
$m_H = 125$ GeV/$c^2$ with respect to those of the S-J reanalysis would be
reduced from 47% to 31%.

The number of expected background events in the high-score region of the
NN$_{\rm SIG}$ discriminants is also small and therefore an additional source
of potential statistical fluctuations in the data that might significantly
impact the observed limits. We check for a potential effect from background
event fluctuations on the difference between observed limits of the
$m_H = 125$ GeV/$c^2$ searches by comparing the number of observed events that
satisfy NN$_{\rm SIG}$ > 0.8 to the fitted background predictions for each tag
category in the current analysis and the S-J reanalysis. For the most
sensitive double-tag categories, the predicted (observed) event yields in the
high-score NN$_{\rm SIG}$ region are 37.6 ± 4.6 (37) for SS and 45.6 ± 5.1
(62) for SJ, and 39.5 ± 4.6 (33) for TT and 67.4 ± 6.8 (80) for TL. While the
SJ and TL categories exhibit similar upward fluctuations in data relative to
expectations, the data in the SS (TT) category are consistent with (lower
than) the background expectation.

TABLE V: Percentages of overlapping events between tag categories of this
analysis and the previous one for data events with NN$_{\rm SIG}$ values
greater than 0.8.

        1T      TL      TT
1S      55%     35%     15%
SJ      4%      20%     30%
SS      1%      14%     51%

[Horizontal axis of each panel: NN$_{\rm SIG}$ ($m_H = 125$ GeV/$c^2$)]

FIG. 6: Validation of the background model for (a) TT events and (b)
reanalyzed SS events in the electroweak control region.
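One rough way to compare these observed and predicted yields is a pull, defined here as the difference divided by the quadrature sum of the Poisson spread of the prediction and its quoted uncertainty. This definition is an assumption chosen for illustration and is not taken from the analysis; the yields in the Python sketch are the ones quoted above, and the names are hypothetical.

```python
import math


def pull(observed: int, predicted: float, predicted_unc: float) -> float:
    """(observed - predicted) / sqrt(predicted + predicted_unc^2)."""
    return (observed - predicted) / math.sqrt(predicted + predicted_unc ** 2)


# (observed, predicted, uncertainty) for NN_SIG > 0.8, as quoted in the text.
high_score_yields = {
    "SS": (37, 37.6, 4.6),
    "SJ": (62, 45.6, 5.1),
    "TT": (33, 39.5, 4.6),
    "TL": (80, 67.4, 6.8),
}

pulls = {category: pull(*values) for category, values in high_score_yields.items()}
```

Under this definition the SJ and TL pulls are positive and the TT pull is negative, in line with the qualitative statement that SJ and TL fluctuate upward while TT lies below the expectation.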

A simple test is performed in which 5 data events are added into the
high-score region of the TT NN$_{\rm SIG}$ distribution (maintaining the
relative fractions of observed events within each high-score bin) to
approximately match the expected background, as was observed in the SS
category. This change reduces the difference between the present and S-J
reanalyzed limits to 33%. Combining this effect with that of adding the 3
formerly SJ-classified events into the TL category gives a decrease in
observed limits of 19% relative to the S-J reanalysis. This is in reasonable
agreement with the expected improvement, identifying these two effects in data
as the primary source of the change in observed limits at
$m_H = 125$ GeV/$c^2$.

To estimate the probability of an underlying statisti- 
cal effect causing such a sizable change in observed limits, 
correlations between the event samples must be under- 
stood. For technical reasons we are not able to deter- 
mine these correlations separately for each background 
process. Instead, we look directly at the data in the high- 
score regions of the NNsig discriminants, and calculate 
the percentage overlap between the tag categories of this 
analysis and those of the S-J reanalysis. The overlap 
percentages, relative to the current analysis, are given 
in Table V. Based on these percentages, we use simulated
data experiments to estimate the probability that
the observed limits of this analysis and the S-J reanalysis
are compatible. Figure 7 shows a two-dimensional
distribution of expected upper limits, obtained from
producing pairs of expected outcomes between the HOBIT
analysis and S-J reanalysis. To calculate a compatibility
probability (p-value), the probability is estimated for the
HOBIT analysis to be as or more discrepant than what is
observed, given the observed limit of the S-J reanalysis.
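
The conditional probability described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the analysis: conditioning on the observed S-J limit is approximated by a window of hypothetical width `window`, "as or more discrepant" is taken to mean at least as far from the conditional median as the observation, and `toy_pairs` generates purely illustrative correlated pairs with an assumed correlation rather than the ensemble built from the measured overlap percentages.

```python
import random
import statistics


def conditional_p_value(pairs, sj_observed, hobit_observed, window):
    """Two-sided conditional p-value from (S-J limit, HOBIT limit) pairs.

    Assumption: "given the observed S-J limit" is approximated by keeping
    pseudoexperiments whose S-J limit lies within `window` of that value.
    """
    selected = [h for s, h in pairs if abs(s - sj_observed) <= window]
    if not selected:
        raise ValueError("no pseudoexperiments inside the conditioning window")
    centre = statistics.median(selected)
    observed_gap = abs(hobit_observed - centre)
    # Count outcomes at least as far from the conditional median as observed.
    n_extreme = sum(1 for h in selected if abs(h - centre) >= observed_gap)
    return n_extreme / len(selected)


def toy_pairs(n, correlation, seed=1):
    """Toy pairs with the given correlation (0 to 1); illustrative only."""
    rng = random.Random(seed)
    shared_weight = correlation**0.5
    own_weight = (1.0 - correlation) ** 0.5
    pairs = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        sj = shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
        hobit = shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
        pairs.append((sj, hobit))
    return pairs


# Toy usage: limits are expressed as deviations from the median expectation.
pairs = toy_pairs(20000, correlation=0.5)
p_toy = conditional_p_value(pairs, sj_observed=1.0, hobit_observed=-1.0, window=0.1)
print(f"toy conditional p-value: {p_toy:.3f}")
```

The window width trades statistical precision against how sharply the S-J outcome is fixed; a narrower window needs a larger ensemble.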
The two-sided probability for this type of occurrence at
a Higgs boson mass of 125 GeV/c^2 is roughly 7%.



FIG. 7: Pseudoexperiment pairs of expected 95% C.L. upper
limits on Higgs production assuming the HOBIT analysis
(ordinate) and S-J reanalysis (abscissa). For ease in p-value
computation, the expected limits of the S-J reanalysis are
rescaled such that the median-expected limit agrees with
that of the HOBIT analysis.



As a downward shift in observed limits is seen across
the entire range of tested m_H values and not just at
m_H = 125 GeV/c^2, the probability for such a global
shift to occur must be estimated. Limited experimental
resolution of kinematic event input variables to the
multivariate discriminants leads to events being shared within
the high-score NNsig regions of the outputs for
neighboring mass hypotheses. Because of this, we estimate
that the number of independent search regions within
our tested Higgs boson mass range lies somewhere between
two and three. We therefore perform the pseudoexperiment
study for three Higgs boson mass assumptions,
obtaining p-values at m_H = 100, 125 and 150 GeV/c^2.
Each p-value is on the order of 10%. To estimate an
approximate global probability, we combine the obtained
p-values for the three Higgs boson mass assumptions using
Fisher's method for combining independent tests. We
obtain a global probability of roughly 3% or 5% depending
on whether the number of independent kinematic search
regions is three or two, respectively.



3. Background modeling

In order to conclude that the observed effect in data
originates from statistical fluctuations as opposed to
potential background mismodeling, we confirm the
robustness of our background model in several data control
samples. Events in the intermediate-score region of the
NNsig distributions are also useful for testing the
background modeling. We compare predicted and observed
event yields in the NNsig score region between 0.5 and
0.8, which contains higher event yields but is above the
low-score event region, which drives the fitted
normalizations of the background contributions. Assuming a
Higgs boson mass of 125 GeV/c^2, the predicted (observed)
event yields in the intermediate-score NNsig region
are 228.8 ± 21.0 (217) for SS and 312.5 ± 22.6 (291) for
SJ in the S-J reanalysis and 264.8 ± 25.1 (265) for TT and
506.1 ± 38.8 (506) for TL in the current one. Good
agreement between the observed and predicted event yields
is found at the other Higgs boson mass assumptions as
well. In the intermediate-score regions, there is thus no
indication of a background modeling problem that could
account for such sizable shifts in observed limits with
respect to the S-J reanalysis.



C. Summary of discussion 

To summarize, the observed limits are very sensitive
to statistical fluctuations in the highest-value bins of the
NNsig distributions. There is no evidence of any significant
mismodeling of the HOBIT b-jet identification algorithm,
or of the NNqcd or NNsig distributions and the
distributions of their respective input variables, in any of
the control regions studied. The observed migration of
events across the b-tag categories is fairly consistent with
expectations derived from simulation. In the most sensitive
tag category, TT, the data yield is about 1 standard
deviation below the background prediction in the signal
region. Using an ensemble of simulated experiments, we
estimate that the probability for the observed limit to
change, relative to the S-J reanalysis, by an amount at
least as large as that observed due to statistical
fluctuations alone is about 5%. We conclude that the change
in the observed limits relative to the previous analysis is
primarily due to statistical fluctuations.
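
The global probability quoted above can be cross-checked with a short calculation. The sketch below applies Fisher's method, in which -2 times the sum of ln(p_i) follows a chi-square distribution with 2n degrees of freedom for n independent tests, and uses the closed-form chi-square tail for an even number of degrees of freedom. Setting every input to exactly 0.10 is an assumption standing in for p-values that the text describes only as being on the order of 10%, and `fisher_combined_p` is a hypothetical helper name.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X2 = -2 * sum(ln p_i) is chi-square distributed with
    2n degrees of freedom; for even degrees of freedom its tail is
    exp(-X2 / 2) * sum_{k < n} (X2 / 2)**k / k!.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X2 / 2
    tail_sum = sum(half_stat**k / math.factorial(k) for k in range(n))
    return math.exp(-half_stat) * tail_sum


# Assumed inputs: each p-value set to exactly 0.10 for illustration.
p_three = fisher_combined_p([0.10, 0.10, 0.10])  # three independent regions
p_two = fisher_combined_p([0.10, 0.10])          # two independent regions

print(f"three regions: {p_three:.3f}")  # about 0.032
print(f"two regions:   {p_two:.3f}")    # about 0.056
```

With three inputs of 0.10 the combined probability is about 3%, and with two it is about 6%, close to the roughly 3% and 5% quoted in the text.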



VIII. CONCLUSION 

In conclusion, we have performed an updated Higgs boson
search in the missing transverse energy + bb final state,
using the full CDF data set and an improved b-tagging
algorithm. With respect to the previous analysis, the
expected 95% C.L. limits have improved by 14% on average
across the Higgs boson mass range 90 < m_H < 150 GeV/c^2.
The observed 95% C.L. upper limit at a Higgs boson mass of
125 GeV/c^2 is a factor of 3.06 times the SM prediction.
The results of this analysis correspond to some of the most
sensitive limits obtained on Higgs boson production in the
bb final state.